GOODNESS OF FIT TESTS AND ENTROPY
by Emanuel Parzen

Department of Statistics, Texas A&M University^

Dedicated to the memory of Paruchuri R. Krishnaiah

Abstract: This paper discusses the unifying role of entropy statistics and concepts in
developing goodness of fit tests for a parametric model $F(x; \theta)$ for a continuous
distribution function $F(x)$, given a random sample from the distribution $F$. Statistics
discussed are those introduced by Moran (extended by Cheng and Stephens), Vasicek and Dudewicz
& van der Meulen (based on gap estimators of the quantile density function), Parzen
(autoregressive estimators of quantile density functions), and Shapiro and Wilk. They are
given unified formulations as entropy difference statistics. Their 95% significance levels
for sample sizes 20 and 50 are compared and shown to increase as the amount of “smoothing”
decreases.

1. Introduction to Entropy. This paper discusses the unifying role of entropy
concepts in testing goodness of fit of a random sample of a continuous random variable $X$.
The problem is to test the fit of a parametric model $F(x; \theta)$, $\theta$ a vector of
parameters, to the true distribution function $F(x) = \mathrm{Prob}[X \le x]$ of $X$ with
probability density function $f(x) = F'(x)$.

The true quantile function of $X$ is $Q(u) = F^{-1}(u)$. The quantile density function is
$q(u) = Q'(u) = 1/fQ(u)$; the density quantile function is $fQ(u) = f(Q(u))$. The entropy
$H(f)$ of a random variable $X$ is

$$H(f) = \int_{-\infty}^{\infty} \{-\log f(x)\} f(x)\,dx = \int_0^1 \log q(u)\,du = H(q).$$

In general, $H(q)$ can be any real number. But if $q(u)$ integrates to 1, corresponding to
a random variable on the unit interval, then the neg-entropy $-H(q)$ is non-negative. The
entropy statistics for goodness of fit are constructed to be non-negative.

^Research supported by the U.S. Army Research Office.

The quantile density function $q(u)$ is in general a non-integrable function with a large
dynamic range. We always assume that $\log q(u)$ is integrable, which means that $X$ has
finite entropy.
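The quantile form of the entropy can be checked numerically. The following is a minimal sketch, assuming a plain midpoint rule on (0, 1); the function name `entropy_from_quantile_density` and the two test distributions are illustrative choices, not part of the paper.

```python
import math

def entropy_from_quantile_density(q, n_grid=100_000):
    """Midpoint-rule approximation of H(q) = integral over (0, 1) of log q(u) du."""
    h = 1.0 / n_grid
    return h * sum(math.log(q((k + 0.5) * h)) for k in range(n_grid))

# Standard exponential: Q(u) = -log(1 - u), so q(u) = 1 / (1 - u) and H(f) = 1.
h_exponential = entropy_from_quantile_density(lambda u: 1.0 / (1.0 - u))

# Uniform on (0, 3): Q(u) = 3u, so q(u) = 3 and H(f) = log 3.
h_uniform = entropy_from_quantile_density(lambda u: 3.0)
```

In the first case q(u) is not integrable over (0, 1) while log q(u) is, which is the situation the assumption of finite entropy is meant to cover.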

2. Moran’s statistic. Assume that the random sample consists of distinct observations
with order statistics denoted $X(1;n) < \dots < X(n;n)$. The probability integral
transform $Y = F(X; \theta)$, for a specified value of $\theta$, has order statistics denoted
$Y(1;n) < \dots < Y(n;n)$. Let $Y(0;n) = 0$, $Y(n+1;n) = 1$. Let
$D_i(\theta) = Y(i;n) - Y(i-1;n)$, $i = 1, \dots, n+1$. Cheng and Stephens (1989) define
Moran’s statistic to be

$$M(\theta) = \sum_{i=1}^{n+1} \{-\log D_i(\theta)\}.$$

They study the asymptotic distribution of $M(\theta)$ when $\theta$ is the true parameter
value, and when $\theta$ is replaced by an efficient estimator $\hat\theta$. They illustrate
the usefulness of Moran’s statistic by an example of real data where $M(\theta)$ correctly
rejects the hypothesis that $X$ is normal, in contrast to more traditional empirical
distribution function statistics such as the Kolmogorov-Smirnov and Cramér-von Mises
statistics, which accept the hypothesis of normality for the sample tested. Our aim in this
paper is to provide a variety of alternatives to Moran’s statistic by expressing it as an
entropy statistic and to discuss how to generate entropy statistics.

Our first step is to normalize Moran’s statistic by giving it a new definition; define

$$\tilde M(\theta) = \frac{1}{n+1} \sum_{i=1}^{n+1} \{-\log d_i(\theta)\} = \int_0^1 \{-\log \tilde d(u;\theta)\}\,du,$$

defining for $i = 1, \dots, n+1$

$$d_i(\theta) = (n+1)\{Y(i;n) - Y(i-1;n)\} = (n+1) D_i(\theta),$$

$$\tilde d(u;\theta) = d_i(\theta), \quad (i-1)/(n+1) < u < i/(n+1).$$
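The two definitions can be computed directly from a sample and a hypothesized distribution function. The sketch below is illustrative only: the function names `moran_statistics` and `normal_cdf` are assumptions, and the code presumes distinct observations, as the text does (a tie would produce a zero spacing).

```python
import math

def moran_statistics(sample, cdf):
    """Return (M, M_tilde): Moran's statistic and its normalized form.

    `cdf` is the hypothesized distribution function F(x; theta) for a fixed theta.
    """
    n = len(sample)
    # Order statistics of Y = F(X; theta), padded with Y(0;n) = 0 and Y(n+1;n) = 1.
    y = [0.0] + sorted(cdf(x) for x in sample) + [1.0]
    spacings = [y[i] - y[i - 1] for i in range(1, n + 2)]   # D_i(theta)
    m = sum(-math.log(d) for d in spacings)                 # M(theta)
    # Normalized form: average of -log d_i with d_i = (n + 1) * D_i.
    m_tilde = sum(-math.log((n + 1) * d) for d in spacings) / (n + 1)
    return m, m_tilde

def normal_cdf(x, mu=0.0, sigma=1.0):
    """Normal distribution function, usable as `cdf` when testing normality."""
    return 0.5 * (1.0 + math.erf((x - mu) / (sigma * math.sqrt(2.0))))

# Perfectly even spacings under the uniform model give M_tilde = 0.
n = 9
even = [i / (n + 1) for i in range(1, n + 1)]
m_even, m_tilde_even = moran_statistics(even, lambda x: x)
```

The normalized statistic is the unnormalized one shifted by log(n + 1) and divided by n + 1, so it stays on a comparable scale as the sample size changes.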
The quantile function of $Y = F(X; \theta)$ is $D(u; \theta) = F(Q(u); \theta)$; it can be
estimated by $\tilde D(u; \theta)$ as well as $F(\tilde Q(u); \theta)$, where $\tilde Q(u)$
is the sample quantile function of the $X$ sample. An estimator of the quantile density
function

$$d(u;\theta) = D'(u;\theta) = f(Q(u);\theta)/fQ(u)$$

is $\tilde d(u;\theta)$. We call $d(u;\theta)$ a comparison density function, denoted
$d(u; F(x), F(x;\theta))$.

Moran’s statistic $\tilde M(\theta)$ is an estimator of $M(\theta) = -H(d(u;\theta))$, the
neg-entropy $-H(Y)$ of $Y = F(X;\theta)$. When $\theta$ is the true parameter value $Y$ is
uniform and $H(Y) = 0$; Cheng and Stephens (1989) show that $\tilde M(\theta)$ is
asymptotically normal with mean $\gamma = .57722$, Euler’s constant. Therefore one may want
to consider an unbiased entropy statistic $M^*(\theta) = \tilde M(\theta) - .57722$, which
when $\theta$ is the true parameter value is asymptotically normal with mean zero and variance

$$\mathrm{VAR}[\tilde M(\theta)] = \frac{1}{n+1}\left(\frac{\pi^2}{6} - 1\right).$$

Cheng and Stephens (1989) use small sample corrections of this asymptotic distribution
theory to compute significance levels of $M^*(\theta)$; for example, for $n = 20$,
$\mathrm{Prob}[M^*(\theta) \le .48] = .95$. We note that $\tilde M(\theta)$ uses a least
smooth estimator of $d(u;\theta)$, and one should consider other entropy statistics of
goodness of fit generated by

$$\hat M(\theta) = \int_0^1 \{-\log \hat d(u;\theta)\}\,du$$

where $\hat d(u;\theta)$ is a smooth estimator of $d(u;\theta)$ of the form discussed in the sequel.
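The asymptotic mean and variance quoted above can be checked by simulation under the null hypothesis. This is a rough sketch rather than the small-sample theory of Cheng and Stephens: the seed, sample size, replication count and function names are arbitrary assumptions, and only the asymptotic formulas from the text are used.

```python
import math
import random

EULER_GAMMA = 0.5772156649015329

def moran_tilde_uniform(y_values):
    """Normalized Moran statistic for values already transformed to (0, 1)."""
    n = len(y_values)
    y = [0.0] + sorted(y_values) + [1.0]
    return sum(-math.log((n + 1) * (y[i] - y[i - 1])) for i in range(1, n + 2)) / (n + 1)

def moran_star_and_z(m_tilde, n):
    """Return (M*, z) using the asymptotic mean gamma and variance (pi^2/6 - 1)/(n + 1)."""
    m_star = m_tilde - EULER_GAMMA
    variance = (math.pi ** 2 / 6.0 - 1.0) / (n + 1)
    return m_star, m_star / math.sqrt(variance)

# Monte Carlo check under the null hypothesis (Y uniform), with an arbitrary seed.
rng = random.Random(12345)
n, replications = 200, 2000
values = [moran_tilde_uniform([rng.random() for _ in range(n)]) for _ in range(replications)]
mc_mean = sum(values) / replications
mc_var = sum((v - mc_mean) ** 2 for v in values) / (replications - 1)
```

For moderate n the simulated mean sits slightly below Euler's constant, which is one reason small-sample corrections matter at n = 20.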

3. Kullback information divergence. The non-negative quantity $M(\theta)$ being
estimated by $\tilde M(\theta)$ is the neg-entropy of $Y = F(X;\theta)$. It also can be
identified to equal

$$I(f; f(\cdot;\theta)) = \int_{-\infty}^{\infty} \{-\log(f(x;\theta)/f(x))\} f(x)\,dx,$$

the Kullback information divergence between the true distribution function $F(x)$ and the
parametric model $F(x;\theta)$, since

$$M(\theta) = \int_0^1 \{-\log(f(Q(u);\theta)/fQ(u))\}\,du.$$
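The identity between the neg-entropy on the quantile scale and the Kullback information divergence can be illustrated with a case that has a closed form. The sketch below assumes a true standard exponential distribution and a misspecified exponential model with rate `theta`; these choices and the helper `midpoint_integral` are illustrative, not taken from the paper.

```python
import math

def midpoint_integral(g, n_grid=200_000):
    """Midpoint-rule approximation of the integral of g over (0, 1)."""
    h = 1.0 / n_grid
    return h * sum(g((k + 0.5) * h) for k in range(n_grid))

theta = 2.0   # rate of the (misspecified) exponential model; the true rate is 1

def integrand(u):
    x = -math.log(1.0 - u)                    # Q(u) for the true distribution
    f_model = theta * math.exp(-theta * x)    # f(Q(u); theta)
    f_true = 1.0 - u                          # fQ(u) = exp(-Q(u))
    return -math.log(f_model / f_true)

neg_entropy = midpoint_integral(integrand)        # M(theta) on the quantile scale
kl_closed_form = theta - 1.0 - math.log(theta)    # divergence integrated over x directly
```

Both quantities are about 0.307 for theta = 2, and both vanish when theta = 1, the true value.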

Define the sample distribution function $\tilde F(x)$ = fraction of random sample $\le x$,
with symbolic probability density $\tilde f$. Moran’s statistic $\tilde M(\hat\theta)$,
where $\hat\theta$ is an efficient parameter estimator, can be regarded as an estimator of
the information divergence $I(\tilde f; f(\cdot;\hat\theta))$ between the data and the
optimal parametric model. Other entropy statistics are obtained by alternative estimators
of $I(\tilde f; f(\cdot;\hat\theta))$, the sample to model information divergence.

Information divergence $I(f; f(\cdot;\theta))$ can be expressed

$$I(f; f(\cdot;\theta)) = H(f; f(\cdot;\theta)) - H(f),$$

defining cross-entropy

$$H(f; f(\cdot;\theta)) = \int_{-\infty}^{\infty} \{-\log f(x;\theta)\} f(x)\,dx = \int_{-\infty}^{\infty} \{-\log f(x;\theta)\}\,dF(x).$$

Cross-entropy is related to maximum likelihood estimation. Define the sample cross-entropy
between the parametric model $F(x;\theta)$ and the sample distribution function
$\tilde F(x)$ by

$$H(\tilde f; f(\cdot;\theta)) = -\tilde E[\log f(x;\theta)] = -(1/n) \sum_{t=1}^{n} \log f(X(t);\theta).$$

The maximum likelihood estimator $\hat\theta$ is the minimum sample cross-entropy estimator.
Define for any statistic $T(x)$

$$\tilde E[T(x)] = (1/n) \sum_{t=1}^{n} T(X(t)), \qquad E_\theta[T(x)] = \int_{-\infty}^{\infty} T(x) f(x;\theta)\,dx.$$

A model $f(x;\theta)$ is said to obey an exponential model if

$$\log f(x;\theta) = \sum_{j=1}^{k} \theta_j T_j(x) - \Psi(\theta).$$

The maximum likelihood estimator of an exponential model can be shown to be a method
of moments estimator; $\hat\theta$ is the value of $\theta$ satisfying

$$\tilde E[T_j(x)] = E_\theta[T_j(x)], \quad j = 1, \dots, k.$$

Further the minimum sample cross-entropy equals the entropy of $f(\cdot;\hat\theta)$:

$$H(\tilde f; f(\cdot;\hat\theta)) = H(f(\cdot;\hat\theta)) = \Psi(\hat\theta) - \sum_{j=1}^{k} \hat\theta_j \tilde E[T_j(x)].$$
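For the normal family, which is an exponential model with sufficient statistics x and x squared, these statements can be verified on a small data set. The following sketch uses made-up data values and hypothetical names; it checks that the moment-matching estimates minimize the sample cross-entropy and that the minimum equals the entropy of the fitted normal density.

```python
import math

def sample_cross_entropy(sample, mu, sigma):
    """-(1/n) * sum of log f(x; mu, sigma) for the normal density."""
    n = len(sample)
    log_density_sum = sum(
        -0.5 * math.log(2.0 * math.pi) - math.log(sigma) - 0.5 * ((x - mu) / sigma) ** 2
        for x in sample
    )
    return -log_density_sum / n

data = [2.1, 3.4, 1.9, 5.0, 4.2, 3.3, 2.8, 4.7]   # made-up illustrative values
n = len(data)

# Method-of-moments solution: match the sample means of T1(x) = x and T2(x) = x^2.
mu_hat = sum(data) / n
sigma_hat = math.sqrt(sum(x * x for x in data) / n - mu_hat ** 2)

minimum_cross_entropy = sample_cross_entropy(data, mu_hat, sigma_hat)
# Entropy of the fitted normal density: 0.5 * (1 + log 2 pi) + log sigma_hat.
model_entropy = 0.5 * (1.0 + math.log(2.0 * math.pi)) + math.log(sigma_hat)
```

Moving either parameter away from its moment-matching value increases the sample cross-entropy, which is the sense in which maximum likelihood is minimum sample cross-entropy.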

4. Entropy difference goodness of fit statistics for exponential models. An
important conclusion can now be formulated. When the parametric model is an exponential
model, the natural entropy statistic to test goodness of fit given by the sample to model
information divergence can be expressed as an entropy difference statistic

$$I(\tilde f; f(\cdot;\hat\theta)) = H(f(\cdot;\hat\theta)) - H(\tilde f)$$

and can be estimated by $H(f(\cdot;\hat\theta)) - \hat H(f)$ where $\hat H(f)$ is an
estimator of $H(f)$. Since $H(f(\cdot;\hat\theta))$ can be interpreted as the “maximum
entropy” we obtain a “non-negative statistic” by the entropy difference statistics to test
goodness of fit of a parametric model. Note $H(f(\cdot;\hat\theta))$ is an estimator
evaluated under the assumption that $f$ obeys the null hypothesis of belonging to the
parametric family $f(x;\theta)$, and $\hat H(f)$ is a non-parametric evaluation based on a
smooth non-parametric estimator of the true density $f$.

5. Gap estimators. A basic approach to estimators $\hat H(f)$ is to use the entropy
formula

$$H(f) = \int_0^1 \log q(u)\,du$$

in terms of the quantile density function $q(u)$. Many approaches are available to form
estimators $\hat q(u)$ and thus estimators $\hat H(f)$ of the entropy of the true
probability density $f$. The earliest approach considered by researchers is equivalent to

$$\hat H(f) = H(\tilde q_{2\nu}) = \frac{1}{n - 2\nu} \sum_{j=\nu+1}^{n-\nu} \log \tilde q_{2\nu}(j/(n+1))$$

where for $j = \nu+1, \dots, n-\nu$

$$\tilde q_{2\nu}(j/(n+1)) = \frac{n+1}{2\nu} \{X(j+\nu;n) - X(j-\nu;n)\}$$

is an estimator of the quantile density $q(u)$ of $X$ at $u = j/(n+1)$. We call these
estimators gap (of order $\nu$) estimators; they were introduced and studied for
$\nu = 1, 2, 3, 4, 5$ by Vasicek (1977) to test normality and Dudewicz and van der Meulen
(1981) to test uniformity.
Normality is an example of a location-scale parametric model

$$Q(u) = \mu + \sigma Q_0(u)$$

where $\mu$ and $\sigma$ are parameters to be estimated and $Q_0(u)$ is the quantile function
of a known standard distribution (for normality, $Q_0(u) = \Phi^{-1}(u)$, the inverse of the
standard normal distribution function). For a location-scale parametric model

$$H(f(\cdot;\hat\theta)) = \log \hat\sigma + H(f_0).$$

For a normal distribution, $H(f_0) = .5\{1 + \log 2\pi\}$ and $\hat\sigma$ is the sample
standard deviation. Vasicek (1977) entropy statistic for testing normality can be expressed

$$\Delta_\nu = \log \hat\sigma + H(f_0) - H(\tilde q_{2\nu}).$$
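The gap estimator of entropy and the resulting entropy difference for normality can be sketched as follows. The function names are hypothetical, and the use of divisor n in the sample standard deviation is an assumption, since the text does not say which divisor is intended.

```python
import math

def gap_entropy(sample, nu):
    """Average of log gap quantile-density estimates for j = nu+1, ..., n-nu."""
    x = sorted(sample)
    n = len(x)
    if n <= 2 * nu:
        raise ValueError("need n > 2 * nu")
    total = 0.0
    for j in range(nu + 1, n - nu + 1):          # 1-based index j, as in the text
        gap = x[j + nu - 1] - x[j - nu - 1]      # X(j+nu; n) - X(j-nu; n)
        total += math.log((n + 1) / (2 * nu) * gap)
    return total / (n - 2 * nu)

def vasicek_delta(sample, nu):
    """Entropy difference log(sigma_hat) + H(f0) - gap entropy, for the normal model."""
    n = len(sample)
    mean = sum(sample) / n
    # Assumption: divisor n (maximum likelihood form) for the standard deviation.
    sigma_hat = math.sqrt(sum((v - mean) ** 2 for v in sample) / n)
    h_f0 = 0.5 * (1.0 + math.log(2.0 * math.pi))
    return math.log(sigma_hat) + h_f0 - gap_entropy(sample, nu)

# Equally spaced points 1, ..., n have every gap of order nu equal to 2 * nu,
# so the estimated entropy is log(n + 1).
grid = list(range(1, 21))
h_grid = gap_entropy(grid, 3)
delta_grid = vasicek_delta(grid, 3)
```

Larger values of the entropy difference are evidence against normality; the thresholds themselves come from simulation and are not computed here.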

6. Autoregressive estimators. An alternative approach to estimating $q(u)$ when
one desires a goodness of fit test of a location-scale parametric model is to estimate the
density $d(u)$, $0 < u < 1$, defined by

$$d(u) = (1/\sigma_0) f_0Q_0(u) q(u),$$

where $\sigma_0 = \int_0^1 f_0Q_0(u) q(u)\,du$. We call $d(u)$ a didi (divided difference)
density, or weighted spacings density, denoted $dd(u; F(x), F_0(x))$. They provide an
alternative to $Q - Q_0$ plots. Notice that the neg-entropy of $d$ satisfies

$$-H(d) = \int_0^1 \{-\log d(u)\}\,du = \log \sigma_0 + H(f_0) - H(q).$$
Therefore $-H(d)$ is an entropy difference, and an estimator of $d(u)$ provides in one
stroke an entropy difference statistic for goodness of fit!
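The entropy-difference form of the neg-entropy of d can be illustrated numerically. The sketch below assumes, purely for illustration, that the true distribution is standard logistic while the standard model is normal; the midpoint rule and all names are assumptions rather than anything prescribed by the paper.

```python
import math
from statistics import NormalDist

STD_NORMAL = NormalDist()

def midpoint_integral(g, n_grid=20_000):
    """Midpoint-rule approximation of the integral of g over (0, 1)."""
    h = 1.0 / n_grid
    return h * sum(g((k + 0.5) * h) for k in range(n_grid))

def f0_q0(u):
    """Density quantile function of the standard normal: f0(Q0(u))."""
    return STD_NORMAL.pdf(STD_NORMAL.inv_cdf(u))

def q_logistic(u):
    """Quantile density of the standard logistic, whose Q(u) = log(u / (1 - u))."""
    return 1.0 / (u * (1.0 - u))

sigma_0 = midpoint_integral(lambda u: f0_q0(u) * q_logistic(u))

def d(u):
    return f0_q0(u) * q_logistic(u) / sigma_0

neg_entropy_d = midpoint_integral(lambda u: -math.log(d(u)))

h_f0 = 0.5 * (1.0 + math.log(2.0 * math.pi))   # entropy of the standard normal
h_q = 2.0                                       # entropy of the standard logistic
entropy_difference = math.log(sigma_0) + h_f0 - h_q
```

The two quantities agree up to quadrature error and are small but positive, reflecting how close the logistic shape is to the normal.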

We assume that $d(u)$, $1/d(u)$, $\log d(u)$ are integrable functions. Estimating $d(u)$
rather than $q(u)$ can be regarded as a process of preflattening the function to be estimated.
We currently prefer estimation of $d(u)$ by kernel estimators, using boundary kernels to
compensate for end effects at 0 and 1, or by maximum entropy estimation using exponential
models for $d(u)$.

In Parzen (1979) we introduced the autoregressive method of estimating $d(u)$ which
has other close connections to entropy statistics for goodness of fit. Raw estimators
$\tilde d(u)$ and $\tilde\sigma_0$ are formed by replacing $q(u)$ by a least smooth gap
estimator $\tilde q_2(u)$. Smooth estimators $\hat d_m(u)$ are formed by the autoregressive
method.

From estimators $\tilde\rho(v)$ of the pseudo-correlations

$$\rho(v) = \int_0^1 e^{2\pi i u v} d(u)\,du, \quad v = 0, \pm 1, \dots, \pm m,$$

one estimates (using suitable Yule-Walker equations) the coefficients of the autoregressive
order $m$ approximator

$$\hat d_m(u) = \hat K_m \left|1 + \hat\alpha_m(1) e^{2\pi i u} + \dots + \hat\alpha_m(m) e^{2\pi i u m}\right|^{-2}$$

to the raw density $\tilde d(u)$. The coefficient $\hat K_m$ plays an important role in
entropy calculations since

$$\int_0^1 -\log \hat d_m(u)\,du = -\log \hat K_m$$

can be regarded as an estimator of $\int_0^1 -\log \tilde d(u)\,du$, and thus is an entropy
difference statistic for goodness of fit.
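The text does not spell out which Yule-Walker equations are used. The sketch below takes one standard form consistent with the stated approximator (the coefficients, with leading coefficient 1, are orthogonal to the pseudo-correlations at lags 1 to m) and solves it by direct elimination rather than a Levinson-type recursion. All function names and the order 1 test density are assumptions made for illustration.

```python
import cmath
import math

def pseudo_correlations(density, m, n_grid=4096):
    """rho(v) = integral of exp(2*pi*i*u*v) d(u) over (0, 1), v = 0..m (midpoint rule)."""
    h = 1.0 / n_grid
    return [
        h * sum(cmath.exp(2j * math.pi * (k + 0.5) * h * v) * density((k + 0.5) * h)
                for k in range(n_grid))
        for v in range(m + 1)
    ]

def solve_linear(a, b):
    """Solve a x = b for a small complex system by Gaussian elimination with pivoting."""
    size = len(b)
    rows = [list(a[i]) + [b[i]] for i in range(size)]
    for col in range(size):
        pivot = max(range(col, size), key=lambda r: abs(rows[r][col]))
        rows[col], rows[pivot] = rows[pivot], rows[col]
        for r in range(col + 1, size):
            factor = rows[r][col] / rows[col][col]
            for c in range(col, size + 1):
                rows[r][c] -= factor * rows[col][c]
    x = [0j] * size
    for i in reversed(range(size)):
        tail = sum(rows[i][c] * x[c] for c in range(i + 1, size))
        x[i] = (rows[i][size] - tail) / rows[i][i]
    return x

def autoregressive_fit(rho):
    """Return (alpha, K): sum over k of alpha_k rho(k - j) = 0 for j = 1..m, alpha_0 = 1."""
    m = len(rho) - 1
    def r(v):                       # rho(-v) is the complex conjugate of rho(v)
        return rho[v] if v >= 0 else rho[-v].conjugate()
    a = [[r(k - j) for k in range(1, m + 1)] for j in range(1, m + 1)]
    b = [-r(-j) for j in range(1, m + 1)]
    alpha = solve_linear(a, b)
    k_m = (r(0) + sum(alpha[k - 1] * r(k) for k in range(1, m + 1))).real
    return alpha, k_m

# Check on a density that is exactly autoregressive of order 1 (illustrative values).
true_alpha, true_k = 0.3 + 0.4j, 0.75      # K = 1 - |alpha|^2 makes d integrate to 1
def d_true(u):
    return true_k / abs(1.0 + true_alpha * cmath.exp(2j * math.pi * u)) ** 2

alpha_hat, k_hat = autoregressive_fit(pseudo_correlations(d_true, 1))
neg_entropy = -math.log(k_hat)
```

On this test density the fit recovers the coefficient and K, and the value of -log K matches the directly integrated neg-entropy of the density. With real data the pseudo-correlations would come from the raw estimator instead of a known density.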

7. Entropy difference interpretation of Shapiro-Wilk statistic. To test the
hypothesis $H_0$: $X$ is $N(\mu, \sigma^2)$, a test statistic $W$ of Shapiro-Wilk type is of
the form

$$W = \sigma^*/\hat\sigma$$

where $\hat\sigma$ is the sample standard deviation and $\sigma^*$ is an asymptotically
efficient estimator of $\sigma$ based on linear combinations of the order statistics
$X(j;n)$ of the random sample. The first step in the entropy interpretation of $W$ is to
consider instead the statistic

$$-\log W = \log\hat\sigma - \log\sigma^* = \{\log\hat\sigma + H(f_0)\} - \{\log\sigma^* + H(f_0)\}.$$

$-\log W$ is an entropy difference statistic, but it compares two parametric estimators of
entropy based on two approaches to estimating parameters which are both efficient under
the null hypothesis of normality.

8. Comparison of 95% significance levels for small samples. Significance
levels for the entropy-difference statistic $\Delta_W = -\log W$ are obtainable from tables
of the $W$ statistic [for example, Filliben (1975)]. Examples of 95% significance levels
(for accepting normality) are

$\Delta_W \le 0.05$, for sample size $n = 20$;

$\Delta_W \le 0.023$, for sample size $n = 50$.

The various entropy difference statistics can be compared by their significance levels
(see Table). Significance levels of autoregressive goodness of fit statistics
$-\log\hat K_m$ have been derived by a very approximate Monte Carlo simulation (in the case
of testing for normality). An open research problem is investigation of an Akaike-type
criterion for accepting the null hypothesis that $X$ is $F_0((x-\mu)/\sigma)$, such as:

$$(2m/n) + \log\hat K_m > 0 \quad \text{for } m = 1, 2, \dots$$
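The criterion amounts to comparing each observed value of -log K_m with the rough threshold 2m/n. A minimal sketch follows; the function names and the observed values are hypothetical, and the criterion itself is described in the text as an open research problem rather than an established test.

```python
def rough_thresholds(n, max_order=5):
    """Rough 5% thresholds 2m/n for -log K_m, m = 1, ..., max_order."""
    return [2.0 * m / n for m in range(1, max_order + 1)]

def akaike_accepts(neg_log_k, n):
    """True if (2m/n) + log K_m > 0 for every supplied order m = 1, 2, ...

    `neg_log_k[m - 1]` holds the observed value of -log K_m.
    """
    return all(2.0 * m / n - value > 0.0 for m, value in enumerate(neg_log_k, start=1))

thresholds_20 = rough_thresholds(20)
thresholds_50 = rough_thresholds(50)

# Hypothetical observed values of -log K_m for a sample of size 50.
accepted = akaike_accepts([0.01, 0.03, 0.05], 50)
rejected = akaike_accepts([0.01, 0.03, 0.15], 50)
```

For n = 20 the thresholds run from .10 to .50 in steps of .10, and for n = 50 from .04 to .20 in steps of .04.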

Significance levels of the Vasicek (1977) statistic $\Delta_\nu$, defined in section 5, are
based on Monte Carlo simulation of normal samples; significance levels of the similar
Dudewicz-van der Meulen (1981) statistic $\Delta_\nu = -H(\tilde q_{2\nu})$ to test
uniformity are based on Monte Carlo simulation of uniform samples.
One can conjecture a relation between gap order $2\nu$ and autoregressive order $m$ for the
corresponding estimators to have similar distributions and therefore similar significance
levels:

$$(2\nu)\,m = n = \text{sample size}.$$

To understand what this conjecture is alleging note that for $n = 20$, $m = 4$ is similar to
$2\nu = 6$; for $n = 50$, $m = 6$ is similar to $2\nu = 8$. When one uses gap estimators of
$q(u)$, and thus of entropy, one has the problem of determining the order $2\nu$. One may be
able to more easily develop criteria for determining the order $m$ of autoregressive
estimators of $q(u)$.

Our modified Moran statistic $M^*(\theta)$ can be compared by noting that it has 95%
significance level .48 for $n = 20$. Significance level appears to increase as the amount of
smoothing of the “hidden” density $d(u)$ decreases. Investigating this phenomenon is a good
topic for future research.
Table. 95% SIGNIFICANCE LEVELS FOR ENTROPY DIFFERENCE STATISTICS. Accept H_0: X is
N(μ, σ²) for some μ and σ if entropy difference is less than threshold given. Values in
parentheses are the rough approximation 2m/n (autoregressive columns) and the
Dudewicz-van der Meulen test of U(0, 1) (gap columns).

Sample   -log W         -log K_m, autoregressive order m          Δ_ν, gap estimator of order 2ν
size n   Shapiro-Wilk   Monte Carlo 5% level                      Vasicek test for normality
                        m=1    m=2    m=3    m=4    m=5           ν=5    ν=4    ν=3    ν=2    ν=1

20       .05            .141   .235   .299   .378   .398          -      -      .40    -      .61
                        (.10)  (.20)  (.30)  (.40)  (.50)         -      -      (.43)  -      (.66)

50       .023           .045   .081   .126   .153   .176          .21    .21    .23    -      -
                        (.04)  (.08)  (.12)  (.16)  (.20)         (.22)  (.22)  (.24)  -      -